On the computation of Shannon Entropy from Counting Bloom Filters
Abstract
A Bloom filter [Blo70] performs set membership testing without requiring the actual elements of the set to be available. The test can give false positives (it reports that the tested element is in the set when it is not), but never false negatives (if the test reports that the element is not in the set, it definitely is not). The data structure consists of a bit array, initially set to all zeros, and a set of hash functions. The output range of each hash function is exactly as large as the length of the array.

To add an element to the structure (i.e., register it as an element of the set), the element is hashed with each of the hash functions, and the bits at the indexes corresponding to the hash outputs are set to 1. To test whether an element is in the set, the same hash functions are applied to the element and the bit at each resulting index is checked. If any of these bits is 0, it is concluded that the element is not in the set. This is correct because, had the element been added, all of these bits would have been set to 1 at an earlier point. If all of these bits are 1, it is concluded that the element might be in the set: either the element really is in the set and its bits were set to 1 when it was added, or all of these bits were set to 1 by hashing other elements that are in the set. The latter case is a false positive.

A drawback of Bloom filters is that elements cannot be removed from them. One might naively try to remove an element by setting all bits at its hash indexes to 0. However, this can violate the properties described above if any of these bits was also set to 1 by hashing another element that is still in the set. To overcome this issue, Li Fan et al. [FCAB00] introduced counting Bloom filters.
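The add and test operations of a plain Bloom filter can be sketched as follows. This is a minimal illustration, not the paper's code: the class name `BloomFilter`, its method names, and the choice of deriving the hash functions by salting SHA-256 with the function number are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array plus several hash functions."""

    def __init__(self, size, num_hashes):
        self.size = size              # length of the bit array
        self.num_hashes = num_hashes  # number of hash functions
        self.bits = [0] * size        # initially all zeros

    def _indexes(self, element):
        # Assumption: the i-th hash function is SHA-256 salted with i,
        # reduced modulo the array length so every output is a valid index.
        result = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{element}".encode("utf-8")).digest()
            result.append(int.from_bytes(digest[:8], "big") % self.size)
        return result

    def add(self, element):
        # Set the bit at every index produced by the hash functions.
        for idx in self._indexes(element):
            self.bits[idx] = 1

    def might_contain(self, element):
        # False means "definitely not in the set";
        # True means "possibly in the set" (false positives can occur).
        return all(self.bits[idx] == 1 for idx in self._indexes(element))


bf = BloomFilter(size=64, num_hashes=3)
bf.add("alice")
print(bf.might_contain("alice"))  # an added element is never reported absent
```

Note that the sketch offers no `remove` method: clearing bits could erase evidence of other elements that share an index, which is the drawback described above.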
A counting Bloom filter works like a normal Bloom filter, but instead of a bit array it maintains an array of integers, initially filled with zeros. When an element is added to the filter, the hash functions are computed and, instead of setting the corresponding bits to 1, the count at each of these indexes is incremented by 1. The membership test concludes that the element is not in the set if the count at any of the hash indexes is 0.
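A counting Bloom filter along these lines might look like the sketch below. The names (`CountingBloomFilter`, `might_contain`, `remove`) and the salted SHA-256 hashing are assumptions for illustration. The text above only spells out insertion and the membership test; the `remove` method shown here assumes the usual approach of decrementing the same counts, and assumes the caller only removes elements that were actually added.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter: integer counts instead of bits."""

    def __init__(self, size, num_hashes):
        self.size = size
        self.num_hashes = num_hashes
        self.counts = [0] * size  # initially filled with zeros

    def _indexes(self, element):
        # Assumption: the i-th hash function is SHA-256 salted with i,
        # reduced modulo the array length.
        result = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{element}".encode("utf-8")).digest()
            result.append(int.from_bytes(digest[:8], "big") % self.size)
        return result

    def add(self, element):
        # Increment the count at every hash index by 1.
        for idx in self._indexes(element):
            self.counts[idx] += 1

    def might_contain(self, element):
        # A zero count at any hash index means "definitely not in the set".
        return all(self.counts[idx] > 0 for idx in self._indexes(element))

    def remove(self, element):
        # Assumed removal rule: undo the increments made by add().
        # Refuse if the element is certainly absent, so counts never go negative.
        if not self.might_contain(element):
            raise ValueError("element is not in the filter")
        for idx in self._indexes(element):
            self.counts[idx] -= 1


cbf = CountingBloomFilter(size=64, num_hashes=3)
cbf.add("alice")
cbf.add("bob")
cbf.remove("alice")
print(cbf.might_contain("bob"))  # "bob" survives the removal of "alice"
```

Because each element contributes its own increments, removing one element leaves the counts contributed by the others intact, which is what a plain bit array cannot guarantee.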
Similar resources
Estimating Entropy of Data Streams Using Compressed Counting
The Shannon entropy is a widely used summary statistic, for example in network traffic measurement, anomaly detection, neural computations, spike trains, etc. This study focuses on estimating Shannon entropy of data streams. It is known that Shannon entropy can be approximated by Rényi entropy or Tsallis entropy, which are both functions of the αth frequency moments and approach Shannon entropy a...
Pay for a Sliding Bloom Filter and Get Counting, Distinct Elements, and Entropy for Free
For many networking applications, recent data is more significant than older data, motivating the need for sliding window solutions. Various capabilities, such as DDoS detection and load balancing, require insights about multiple metrics including Bloom filters, per-flow counting, count distinct and entropy estimation. In this work, we present a unified construction that solves all the above pr...
Comparing Binary Iris Biometric Templates Based on Counting Bloom Filters
In this paper a binary biometric comparator based on Counting Bloom filters is introduced. Within the proposed scheme binary biometric feature vectors are analyzed and appropriate bit sequences are mapped to Counting Bloom filters. The comparison of resulting sets of Counting Bloom filters significantly improves the biometric performance of the underlying system. The proposed approach is applie...
Fast Private Set Operations with SEPIA
Private set operations allow correlation of sensitive data from multiple data owners. Although intensely researched, current solutions still exhibit limited scalability in terms of the supported maximum set size and number of sets. To address these issues, we propose a new approach to private set operations based on a combination of efficient secure multiparty computation and bloom filters, a s...
A Preferred Definition of Conditional Rényi Entropy
The Rényi entropy is a generalization of Shannon entropy to a one-parameter family of entropies. Tsallis entropy, too, is a generalization of Shannon entropy. The measure for Tsallis entropy is non-logarithmic. After the introduction of Shannon entropy, the conditional Shannon entropy was derived and its properties became known. Also, for Tsallis entropy, the conditional entropy was introduced a...
Journal: CoRR
Volume: abs/1802.06609
Publication date: 2018